RECEIVING A DOCUMENT THAT IS DESCRIBED BY A
PAGE DESCRIPTION LANGUAGE (PDL) 



DETERMINING A TABLE BOUNDING BOX FOR EACH
TABLE IN THE DOCUMENT BASED ON THE NUMBER
OF WORD CLUSTERS ON EACH LINE AND THE
ALIGNMENT OF WORD CLUSTERS BETWEEN LINES



EXPANDING THE TABLE BOUNDING BOX FOR EACH 
TABLE BASED ON THE CHANGE IN TEXT DENSITY 
BETWEEN LINES OR ON THE NUMBER OF WORD 
CLUSTERS ON A LINE



DATA WITHIN THE TABLE BOUNDING BOX IS 
DESCRIBED IN A MARKUP LANGUAGE AND THE 
ORIGINAL TABLE STRUCTURE IS PRESERVED 



FIG. 2 
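
The detection step of FIG. 2 can be illustrated with a minimal sketch. It assumes each line is already reduced to the left-edge x-coordinates of its word clusters, and it treats a run of two or more consecutive lines as a candidate table when every line has at least two clusters and shares at least one aligned cluster edge with the line before it. The names find_table_runs, min_clusters, and tol are hypothetical, and the thresholds are illustrative rather than taken from the figures.

```python
def aligned(a, b, tol=3.0):
    """True if some cluster edge in line a lies within tol of one in line b."""
    return any(abs(xa - xb) <= tol for xa in a for xb in b)


def find_table_runs(lines, min_clusters=2, tol=3.0):
    """Return (first, last) line-index pairs of candidate tables.

    lines: list of lists of cluster left-edge x-coordinates, top to bottom.
    """
    runs = []
    start = None  # index of the first line of the run being built

    def close(last):
        # Keep only runs that span at least two lines.
        if start is not None and last > start:
            runs.append((start, last))

    for i, xs in enumerate(lines):
        multi = len(xs) >= min_clusters
        if multi and start is None:
            start = i
        elif multi and not aligned(lines[i - 1], xs, tol):
            # Multi-cluster line that does not align: end the run, begin a new one.
            close(i - 1)
            start = i
        elif not multi and start is not None:
            close(i - 1)
            start = None
    close(len(lines) - 1)
    return runs
```

The returned index pairs would then be turned into bounding boxes from the coordinates of the lines they span.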




FIG. 3A 
COMPARE CURRENT
CLUSTER COORDINATES 
WITH SAVED CLUSTER 
COORDINATES 







FIG. 3B 
EXPANDING A FIRST BOUND (E.G., THE UPPER Y-BOUND) OF
EACH TABLE BOUNDING BOX IN A FIRST DIRECTION 
(E.G., AN UP DIRECTION) TO THE FIRST LINE ABOVE THE 
TABLE THAT EITHER HAS A SINGLE WORD CLUSTER OR 
HAS BEEN MARKED AS AN INCREASE IN TEXT DENSITY 



EXPANDING A SECOND BOUND (E.G., THE LOWER Y-BOUND)
OF EACH TABLE BOUNDING BOX IN A SECOND DIRECTION
(E.G., A DOWN DIRECTION) TO THE FIRST LINE BELOW
THE TABLE THAT EITHER HAS A SINGLE WORD CLUSTER
OR HAS BEEN MARKED AS AN INCREASE IN TEXT DENSITY



FIG. 4 
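
The two expansion steps of FIG. 4 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: lines are indexed top to bottom, the bounds are expressed as line indices rather than y-coordinates, the stopping line itself becomes the new bound, and a bound is left unchanged when no stopping line exists. The Line class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Line:
    cluster_count: int              # number of word clusters on the line
    density_increase: bool = False  # line was marked as an increase in text density


def is_stop_line(line):
    """A line ends the expansion if it has a single word cluster or a density mark."""
    return line.cluster_count == 1 or line.density_increase


def expand_bounds(lines, top, bottom):
    """Return the (top, bottom) line indices of the table after expansion."""
    new_top = top
    for i in range(top - 1, -1, -1):          # scan upward from the table
        if is_stop_line(lines[i]):
            new_top = i                        # assumption: stop line is included
            break
    new_bottom = bottom
    for i in range(bottom + 1, len(lines)):   # scan downward from the table
        if is_stop_line(lines[i]):
            new_bottom = i
            break
    return new_top, new_bottom
```

For example, a table on lines 2 to 3 whose nearest single-cluster line above is line 0 and whose nearest density-marked line below is line 5 expands to lines 0 to 5.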
RECEIVING A LIST OF WORDS AND ONE OR MORE
TABLE BOUNDING BOXES 



DIVIDING THE LIST OF WORDS INTO A SET OF LINES 
THAT IS ORDERED BY FIRST COORDINATE 
(E.G., THE Y-COORDINATE) 



FOR EACH LINE, DIVIDING THE WORDS IN EACH LINE 
INTO CLUSTERS BASED ON THE SPACING BETWEEN 
THE WORDS 



DETERMINING IF THE X-Y COORDINATES OF EACH 
WORD CLUSTER BELONG TO A TABLE (I.E., FALL
WITHIN ANY OF THE TABLE BOUNDING BOXES) 



DETERMINING THE ROW 
AND COLUMN OF THE 
TABLE TO WHICH THE 
WORD CLUSTER 
BELONGS 



CONVERTING THE 
WORD CLUSTER INTO 
THE MARKUP LANGUAGE 
WITHOUT ANY TABLE 
STRUCTURE 



CONVERTING THE WORD CLUSTER INTO THE 
MARKUP LANGUAGE WITH TABLE STRUCTURE 



FIG. 5 
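
The first four steps of FIG. 5 can be sketched as below: splitting the word list into lines by y-coordinate, splitting each line into clusters by the gap between words, and testing a cluster against the table bounding boxes. The Word fields, the tolerances y_tol and max_gap, the (x0, y0, x1, y1) box layout, and the choice of the first word's position as the cluster's coordinates are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x0: float  # left edge
    x1: float  # right edge
    y: float   # vertical position of the word


def group_into_lines(words, y_tol=2.0):
    """Split words into lines ordered by y; words within y_tol share a line."""
    lines = []
    for w in sorted(words, key=lambda w: (w.y, w.x0)):
        if lines and abs(lines[-1][0].y - w.y) <= y_tol:
            lines[-1].append(w)
        else:
            lines.append([w])
    return [sorted(line, key=lambda w: w.x0) for line in lines]


def cluster_line(line, max_gap=10.0):
    """Split one line into clusters wherever the gap between words exceeds max_gap."""
    clusters = []
    for w in line:
        if clusters and w.x0 - clusters[-1][-1].x1 <= max_gap:
            clusters[-1].append(w)
        else:
            clusters.append([w])
    return clusters


def in_any_box(cluster, boxes):
    """True if the cluster's first word falls inside any (x0, y0, x1, y1) box."""
    x, y = cluster[0].x0, cluster[0].y
    return any(bx0 <= x <= bx1 and by0 <= y <= by1
               for bx0, by0, bx1, by1 in boxes)
```

A cluster for which in_any_box returns True would go on to the row and column determination; any other cluster would be converted without table structure.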